Reinforcement learning (RL) has shown great promise with algorithms learning in environments with large state and action spaces purely from scalar reward signals. A crucial challenge for current deep RL algorithms is that they require a tremendous amount of environment interactions for learning. This can be infeasible in situations where such interactions are expensive; such as in robotics. Offline RL algorithms try to address this issue by bootstrapping the learning process from existing logged data without needing to interact with the environment from the very beginning. While online RL algorithms are typically evaluated as a function of the number of environment interactions, there exists no single established protocol for evaluating offline RL methods.In this paper, we propose a sequential approach to evaluate offline RL algorithms as a function of the training set size and thus by their data efficiency. Sequential evaluation provides valuable insights into the data efficiency of the learning process and the robustness of algorithms to distribution changes in the dataset while also harmonizing the visualization of the offline and online learning phases. Our approach is generally applicable and easy to implement. We compare several existing offline RL algorithms using this approach and present insights from a variety of tasks and offline datasets.
translated by 谷歌翻译
Humans have perfected the art of learning from multiple modalities through sensory organs. Despite their impressive predictive performance on a single modality, neural networks cannot reach human level accuracy with respect to multiple modalities. This is a particularly challenging task due to variations in the structure of respective modalities. Conditional Batch Normalization (CBN) is a popular method that was proposed to learn contextual features to aid deep learning tasks. This technique uses auxiliary data to improve representational power by learning affine transformations for convolutional neural networks. Despite the boost in performance observed by using CBN layers, our work reveals that the visual features learned by introducing auxiliary data via CBN deteriorates. We perform comprehensive experiments to evaluate the brittleness of CBN networks to various datasets, suggesting that learning from visual features alone could often be superior for generalization. We evaluate CBN models on natural images for bird classification and histology images for cancer type classification. We observe that the CBN network learns close to no visual features on the bird classification dataset and partial visual features on the histology dataset. Our extensive experiments reveal that CBN may encourage shortcut learning between the auxiliary data and labels.
translated by 谷歌翻译
Current pre-trained language models rely on large datasets for achieving state-of-the-art performance. However, past research has shown that not all examples in a dataset are equally important during training. In fact, it is sometimes possible to prune a considerable fraction of the training set while maintaining the test performance. Established on standard vision benchmarks, two gradient-based scoring metrics for finding important examples are GraNd and its estimated version, EL2N. In this work, we employ these two metrics for the first time in NLP. We demonstrate that these metrics need to be computed after at least one epoch of fine-tuning and they are not reliable in early steps. Furthermore, we show that by pruning a small portion of the examples with the highest GraNd/EL2N scores, we can not only preserve the test accuracy, but also surpass it. This paper details adjustments and implementation choices which enable GraNd and EL2N to be applied to NLP.
translated by 谷歌翻译
强化学习(RL)的成功在很大程度上取决于从环境观察中学习强大表示的能力。在大多数情况下,根据价值功能的变化,在各州之间纯粹通过强化学习损失所学的表示形式可能会有很大差异。但是,所学的表示形式不必非常具体地针对手头的任务。仅依靠RL目标可能会产生在连续的时间步骤中变化很大的表示形式。此外,由于RL损失的目标变化,因此所学的表示将取决于当前价值/策略的良好。因此,从主要任务中解开表示形式将使他们更多地专注于捕获可以改善概括的过渡动态。为此,我们提出了局部约束的表示,辅助损失迫使国家表示由邻近状态的表示可以预测。这不仅鼓励表示形式受到价值/政策学习的驱动,还可以自我监督的学习来驱动,这会限制表示表示的变化太快。我们在几个已知的基准上评估了所提出的方法,并观察到强劲的性能。尤其是在连续控制任务中,我们的实验比强基线显示出显着的优势。
translated by 谷歌翻译
大多数强化学习算法都利用了经验重播缓冲液,以反复对代理商过去观察到的样本进行训练。这样可以防止灾难性的遗忘,但是仅仅对每个样本都分配了同等的重要性是一种天真的策略。在本文中,我们提出了一种根据样本可以从样本中学到多少样本确定样本优先级的方法。我们将样本的学习能力定义为随着时间的推移,与该样品相关的训练损失的稳定减少。我们开发了一种算法,以优先考虑具有较高学习能力的样本,同时将优先级较低,为那些难以学习的样本,通常是由噪声或随机性引起的。我们从经验上表明,我们的方法比随机抽样更强大,而且比仅在训练损失方面优先排序更好,即时间差损失,这是在香草优先的经验重播中使用的。
translated by 谷歌翻译
不依赖虚假相关性的学习预测因素涉及建立因果关系。但是,学习这样的表示非常具有挑战性。因此,我们制定了从高维数据中学习因果表示的问题,并通过合成数据研究因果恢复。这项工作引入了贝叶斯因果发现的潜在变量解码器模型BCD,并在轻度监督和无监督的环境中进行实验。我们提出了一系列合成实验,以表征因果发现的重要因素,并表明将已知的干预靶标用作标签有助于无监督的贝叶斯推断,对线性高斯添加噪声潜在结构性因果模型的结构和参数。
translated by 谷歌翻译
在世界上一些地区的蝗虫侵犯,包括非洲,亚洲和中东已成为一个有关人们对数百万人的生命的问题。在这方面,已经尝试通过使用卫星和传感器的蝗虫育种区域的检测和监测来解决或降低该问题的严重程度,或使用化学物质以防止形成群体。然而,这些方法尚未能够抑制蝗虫的出现和集体行为。另一方面,在形成之前预测蝗虫群的位置的能力可以帮助人们更有效地获得侵扰问题。在这里,我们使用机器学习来使用联合国食品和农业组织公布的可用数据来预测蝗虫群的位置。这些数据包括观察到的群体以及环境信息,包括土壤水分和植被密度。所获得的结果表明,我们所提出的模型可以成功,并且具有合理的精度,预测蝗虫群的位置,以及使用密度概念的可能损坏程度。
translated by 谷歌翻译
Neural networks trained on datasets such as ImageNet have led to major advances in visual object classification. One obstacle that prevents networks from reasoning more deeply about complex scenes and situations, and from integrating visual knowledge with natural language, like humans do, is their lack of common sense knowledge about the physical world. Videos, unlike still images, contain a wealth of detailed information about the physical world. However, most labelled video datasets represent high-level concepts rather than detailed physical aspects about actions and scenes. In this work, we describe our ongoing collection of the "something-something" database of video prediction tasks whose solutions require a common sense understanding of the depicted situation. The database currently contains more than 100,000 videos across 174 classes, which are defined as caption-templates. We also describe the challenges in crowd-sourcing this data at scale.
translated by 谷歌翻译
While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer will generally be smaller than the teacher's intermediate hidden layer, additional parameters are introduced to map the student hidden layer to the prediction of the teacher hidden layer. This allows one to train deeper students that can generalize better or run faster, a trade-off that is controlled by the chosen student capacity. For example, on CIFAR-10, a deep student network with almost 10.4 times less parameters outperforms a larger, state-of-the-art teacher network.
translated by 谷歌翻译
Increasingly taking place in online spaces, modern political conversations are typically perceived to be unproductively affirming -- siloed in so called ``echo chambers'' of exclusively like-minded discussants. Yet, to date we lack sufficient means to measure viewpoint diversity in conversations. To this end, in this paper, we operationalize two viewpoint metrics proposed for recommender systems and adapt them to the context of social media conversations. This is the first study to apply these two metrics (Representation and Fragmentation) to real world data and to consider the implications for online conversations specifically. We apply these measures to two topics -- daylight savings time (DST), which serves as a control, and the more politically polarized topic of immigration. We find that the diversity scores for both Fragmentation and Representation are lower for immigration than for DST. Further, we find that while pro-immigrant views receive consistent pushback on the platform, anti-immigrant views largely operate within echo chambers. We observe less severe yet similar patterns for DST. Taken together, Representation and Fragmentation paint a meaningful and important new picture of viewpoint diversity.
translated by 谷歌翻译